Variable Selection in Logistic Regression: The British English Dative Alternation
Abstract
In this paper, we address the problem of selecting the ‘optimal’ variable subset in a logistic regression model for a medium-sized data set. As a case study, we take the British English dative alternation, where speakers and writers can choose between two (equally grammatical) syntactic constructions to express the same meaning. With the help of 29 explanatory variables taken from the literature, we build two types of models: (1) with the verb sense included as a random effect (verb senses often have a bias towards one of the two variants), and (2) without a random effect. For each type, we build three different models: by including all variables and keeping the significant ones, by sequentially adding the most predictive variable (forward regression), and by sequentially removing the least predictive variable (backward regression). Since the six approaches lead to five different models, we advise researchers to be careful about basing their conclusions solely on the one ‘optimal’ model they find.
Related articles
Evaluating automatic annotation: automatically detecting and enriching instances of the dative alternation
In this article, we automatically create two large and richly annotated data sets for studying the English dative alternation. With an intrinsic and an extrinsic evaluation, we address the question of whether such data sets that are obtained and enriched automatically are suitable for linguistic research, even if they contain errors. The extrinsic evaluation consists of building logistic regres...
Choosing alternatives: Using Bayesian Networks and memory-based learning to study the dative alternation
In existing research on syntactic alternations such as the dative alternation (give her the apple vs. give the apple to her), the linguistic data is often analysed with the help of logistic regression models. In this article, we evaluate the use of logistic regression for this type of research, and present two different approaches: Bayesian Networks and Memory-based learning. For the Bayesian ...
Predicting is not explaining: targeted learning of the dative alternation
Corpus linguists dig into large-scale collections of texts to better understand the rules governing a given language. We advocate for ambitious corpus linguistics drawing inspiration from the latest developments of semiparametrics for a modern targeted learning. Transgressing discipline-specific borders, we adapt an approach that has proven successful in biostatistics and apply it to the well-t...
The dative alternation in African American English: Researching syntactic variation and change across sociolinguistic datasets
Recent research has shown the dative alternation in English to be a productive arena for examining the relationship between group-level variation and the internalization of individuals’ grammars. Experimental methods (e.g., Bresnan and Ford 2010) and the analysis of large published corpora (e.g., Bresnan et al. 2007) have revealed subtle cross-dialect differences for this variable. The current ...